Sequence Database Search Using Jumping Alignments

Authors

  • Constantin Bannert
  • Marc Rehmsmeier
  • Rainer Spang
  • Jens Stoye

Abstract

We describe a new algorithm for amino acid sequence classification and the detection of remote homologues. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and hidden Markov models, which focus on vertical information because they model the columns of the alignment independently. In our setting, we want to select from a given database of "candidate sequences" those proteins that belong to a given superfamily. To do so, each candidate sequence is tested separately against a multiple alignment of the known members of the superfamily by means of a new jumping alignment algorithm. This algorithm is an extension of the Smith-Waterman algorithm and computes a local alignment of a single sequence and a multiple alignment. In contrast to traditional methods, however, this alignment is not based on a summary of the individual columns of the multiple alignment. Instead, at each position the candidate sequence is aligned to one sequence of the multiple alignment, called the "reference sequence". In addition, the reference sequence may change within the alignment, and each such jump is penalized. To evaluate the discriminative quality of the jumping alignment algorithm, we compared it to hidden Markov models on a subset of the SCOP database of protein domains. Discriminative quality was assessed by counting the number of false positives that ranked higher than the first true positive (FP-count). For moderate FP-counts above five, the number of successful searches with our method was considerably higher than with hidden Markov models.
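The abstract describes the jumping alignment only in words. The Python sketch below is one plausible reading of it, not the authors' implementation. It builds a Smith-Waterman-style table indexed by candidate position i, alignment column j and current reference row k. Each step may either stay on row k or switch ("jump") to another row at a fixed cost. Several details are assumptions that do not come from the paper: a toy match/mismatch score (+2/-1) in place of a substitution matrix such as BLOSUM62, a linear gap penalty instead of affine gaps, `-` as the gap symbol in the multiple alignment, and free skipping of a column where the current reference already has a gap. The function names `jumping_alignment_score` and `fp_count` and all parameter values are hypothetical. `fp_count` computes the FP-count measure described above. Treating ties with the best-scoring true positive as not ranking higher is also an assumption.

```python
from typing import Sequence

NEG_INF = float("-inf")


def substitution(a: str, b: str, match: int = 2, mismatch: int = -1) -> int:
    """Hypothetical toy score standing in for a real substitution matrix."""
    return match if a == b else mismatch


def jumping_alignment_score(
    candidate: str, msa: Sequence[str], gap: int = 2, jump: int = 3
) -> float:
    """Best local jumping-alignment score of `candidate` against `msa`.

    S[i][j][k] is the best score of a local alignment ending at candidate
    position i and alignment column j while MSA row k is the reference.
    """
    if not msa:
        raise ValueError("the multiple alignment must contain at least one row")
    n, m, rows = len(candidate), len(msa[0]), len(msa)
    if any(len(row) != m for row in msa):
        raise ValueError("all MSA rows must have the same length")
    # Index 0 in the first two dimensions is the empty prefix (score 0).
    S = [[[0.0] * rows for _ in range(m + 1)] for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag_prev, up_prev, left_prev = S[i - 1][j - 1], S[i - 1][j], S[i][j - 1]
            # Best predecessor over all rows, computed once per cell, so that
            # "jump from any row" costs O(1) for each reference row k.
            diag_top, up_top, left_top = max(diag_prev), max(up_prev), max(left_prev)
            for k in range(rows):
                ref = msa[k][j - 1]
                # Candidate residue i aligned to reference residue (none if gap).
                diag = NEG_INF
                if ref != "-":
                    diag = max(diag_prev[k], diag_top - jump) + substitution(
                        candidate[i - 1], ref
                    )
                # Candidate residue i aligned to a gap (insertion).
                up = max(up_prev[k], up_top - jump) - gap
                # Column j skipped; free if the reference has a gap there anyway.
                left = max(left_prev[k], left_top - jump) - (0 if ref == "-" else gap)
                # Local alignment: a path may restart anywhere with score 0.
                S[i][j][k] = max(0.0, diag, up, left)
                best = max(best, S[i][j][k])
    return best


def fp_count(scores: Sequence[float], is_true: Sequence[bool]) -> int:
    """Number of false positives scoring strictly above the best true positive."""
    if len(scores) != len(is_true):
        raise ValueError("scores and labels must have the same length")
    true_scores = [s for s, t in zip(scores, is_true) if t]
    if not true_scores:
        raise ValueError("at least one true positive is required")
    first_true = max(true_scores)
    return sum(1 for s, t in zip(scores, is_true) if not t and s > first_true)
```

For example, `jumping_alignment_score("AAAACCCC", ["AAAAGGGG", "GGGGCCCC"])` reaches its best score only by matching the A block against the first row and then jumping to the second row for the C block. If `jump` is raised above the gain from switching, the score falls back to the best single reference sequence. Because each cell takes the best predecessor over all rows only once, this sketch runs in O(n·m·K) time for K alignment rows.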

Similar resources

Database Searching and BLAST, Tuesday, October 27th

The goal of a database search is to find all "high-scoring" local alignments (i.e., local alignments with a score above a given threshold) and to determine the significance of the alignments found. A database search can be used to compare a protein or a cDNA sequence with genomic DNA, e.g. to find gene location or identify intron/exon boundaries. Another application is to find homologous protein seq...

Database Search Based on Bayesian Alignment

The size of protein sequence database is getting larger each day. One common challenge is to predict protein structures or functions of the sequences in databases. It is easy when a sequence shares direct similarity to a well-characterized protein. If there is no direct similarity, we have to rely on a third sequence or a model as intermediate to link two proteins together. We developed a new m...

Histone Sequence Database: a compilation of highly-conserved nucleoprotein sequences

By searching the current protein sequence databases using sequences from human and chicken histones H1/H5, H2A, H2B, H3 and H4, a database of aligned histone protein sequences with statistically significant sequence similarity to the search sequence was constructed. In addition, a nucleotide sequence database of the corresponding coding regions for these proteins has been assembled. The region ...

CDD: a database of conserved domain alignments with links to domain three-dimensional structure

The Conserved Domain Database (CDD) is a compilation of multiple sequence alignments representing protein domains conserved in molecular evolution. It has been populated with alignment data from the public collections Pfam and SMART, as well as with contributions from colleagues at NCBI. The current version of CDD (v.1.54) contains 3693 such models. CDD alignments are linked to protein sequence...

A Novel Approach to Remote Homology Detection: Jumping Alignments

We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alignment in a well-balanced manner. This is in contrast to established methods such as profiles and profile hidden Markov models which focus on vertical information as they model the columns of the alignment independentl...

Journal:
  • Proceedings. International Conference on Intelligent Systems for Molecular Biology

Volume: 8

Publication date: 2000